amplicon-based metagenomic analysis because it can identify bacterial species. A region
or regions of the gene are amplified using PCR, and the amplicons are then sequenced with
high-throughput technologies. The reads usually come from the same targeted gene but from
several species. The analysis then focuses on identifying the taxonomic groups and their
abundance in the sample. After quality control, features (unique sequences representing
taxonomic groups) are obtained either by clustering or by denoising. There are three kinds of
clustering: de novo clustering, open-reference clustering, and closed-reference clustering.
Any of these clustering methods generates operational taxonomic units (OTUs). Denoising,
on the other hand, attempts to remove base-calling errors and classification errors, and
it produces amplicon sequence variants (ASVs), which are unique features that represent species in the sample. There
are three common algorithms for denoising: DADA2, Deblur, and UNOISE2. The most
commonly used program for amplicon-based metagenomic data analysis is QIIME2,
which implements both clustering methods and denoising methods. To analyze data with
QIIME2, raw data must first be imported into QIIME2 artifacts. Several analyses can be
conducted with QIIME2, including taxonomic group identification and abundance estimation,
phylogenetic analysis, and diversity analysis.
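
To make the idea of features and their abundance concrete, the following is a minimal Python sketch of greedy de novo clustering on toy reads. It is an illustration only and is not how QIIME2 or any of the tools named above are implemented. The function names (dereplicate, identity, greedy_denovo_otus), the position-by-position identity measure on equal-length reads, and the example threshold are all assumptions made for this sketch; real tools use alignment-based similarity and far more efficient search.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into unique sequences with abundance counts."""
    return Counter(reads)


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences.

    Assumption for this toy example: sequences of different length are
    treated as having zero identity (no alignment is performed).
    """
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_denovo_otus(reads, threshold=0.97):
    """Greedy de novo clustering of reads into OTUs.

    Unique sequences are visited from most to least abundant. Each one joins
    the first existing centroid it matches at or above the threshold;
    otherwise it becomes a new centroid. Returns a dict that maps each
    centroid sequence to the total read count of its OTU.
    """
    unique = dereplicate(reads)
    otus = {}
    # Sort by decreasing abundance; ties are broken alphabetically.
    for seq, count in sorted(unique.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:
            otus[seq] = count
    return otus


# Toy data: one abundant sequence, one single-mismatch variant, one distinct sequence.
toy_reads = ["ACGTACGTAC"] * 5 + ["ACGTACGTAA"] + ["TTTTTTTTTT"] * 2
print(greedy_denovo_otus(toy_reads, threshold=0.9))
# prints {'ACGTACGTAC': 6, 'TTTTTTTTTT': 2}
```

With the threshold set to 1.0 the sketch reduces to plain dereplication, in which every unique sequence is its own feature. Denoising algorithms go further than this by using an error model to decide which rare variants are sequencing errors, which the sketch does not attempt.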